Lazy Decomposition for Distributed Decision Procedures 



Youssef Hamadi 



Joao Marques-Silva 



Christoph M. Wintersteiger 



Microsoft Research 
Cambridge, UK 

youssefh@microsoft.com



University College 
Dublin, IE 

jpms@ucd.ie



cwinter@microsoft.com



Microsoft Research 
Cambridge, UK 



The increasing popularity of automated tools for software and hardware verification puts ever increas- 
ing demands on the underlying decision procedures. This paper presents a framework for distributed 
decision procedures (for first-order problems) based on Craig interpolation. Formulas are distributed 
in a lazy fashion, i.e., without the use of costly decomposition algorithms. Potential models which 
are shown to be incorrect are reconciled through the use of Craig interpolants. Experimental results 
on challenging propositional satisfiability problems indicate that our method is able to outperform 
traditional solving techniques even without the use of additional resources. 

1 Introduction 

Decision procedures for first-order logic problems, or fragments thereof, have seen a tremendous increase 
in popularity in the recent past. This is due to the great increase in performance of solvers for the 
propositional satisfiability (SAT) problem, as well as the increasing popularity of verification tools for 
both soft- and hardware which extensively use first-order decision procedures like SAT and SMT solvers. 

As the decision problems that occur in large-scale verification problems become larger, methods 
for distribution and parallelization are required to overcome memory and runtime limitations of mod- 
ern computer systems. Frequently, computing clusters and multi-core processors are employed to solve 
such problems. The inherent parallelism in these systems is often used to solve multiple problems con- 
currently, while distributed and parallel decision procedures would allow for much better performance. 
This has led to the development of distributed verification tools, for example through the distribution of 
Bounded-Model-Checking (BMC) problems (see, e.g., [9, 15]).

In the following sections we present a general method for distributed decision procedures which 
is applicable to decision procedures of first-order fragments. The key component in this method is 
Craig's interpolation theorem [11]. This theorem enables us to arbitrarily split formulas into multiple
parts without any restrictions on the nature or size of the cut. We propose to use this lazy formula 
decomposition because it does not require analysis of the semantics of a formula prior to the distribution 
of the problem. In many other distributed algorithms, such a lazy decomposition clearly has a negative 
impact on the overall runtime of the decision procedure. However, when using an interpolation scheme, 
the abstraction provided by the interpolation algorithm is often strong enough to counterbalance this. 

Through an experimental evaluation of our algorithm on propositional formulas, we are able to show 
that, at least in the propositional case, large speed-ups result from using a lazy decomposition when using 
a suitable interpolation algorithm. 

2 Background 

We are interested in satisfiability of first-order formulas. Formulas are assumed to be in conjunctive
normal form, i.e., of the form $\phi = \psi_1(x_1, \ldots, x_n) \wedge \ldots \wedge \psi_m(x_1, \ldots, x_n)$,
where $V_\phi = \{x_1, \ldots, x_n\}$ are the free variables of $\phi$ and the $\psi_i$ are clauses
(disjunctions of atoms).

Craig's interpolation theorem provides a way to characterize the relationship between two formulas 
when one implies the other: 

Theorem 1 (Craig Interpolation [11]). Let $\phi$ and $\psi$ be first-order formulas. If $\phi \Rightarrow \psi$
then there exists an interpolant $I$ such that $\phi \Rightarrow I \wedge I \Rightarrow \psi$ and
$V_I \subseteq V_\phi \cap V_\psi$.

Equivalently, there is an interpolant $I$ such that $\phi \Rightarrow I \wedge I \Rightarrow \neg\psi$ whenever
$\phi \wedge \psi$ is unsatisfiable, because $(\phi \Rightarrow \neg\psi) = \neg(\phi \wedge \psi)$. Craig's
theorem guarantees the existence of an interpolant, but does not provide an algorithm for obtaining it.
However, such algorithms are known for many logics. We refer to two interpolation algorithms for
propositional logic; to describe them, we require some definitions:

A literal $l$ is either a propositional variable $x$ or its negation $\neg x$. A clause is a disjunction of
literals, denoted $\{l_1, \ldots, l_n\}$. A formula $\phi$ is assumed to be in conjunctive normal form (CNF),
i.e., it is a conjunction of clauses, denoted $\phi = \{c_1, \ldots, c_n\}$.

A (partial) assignment $\alpha$ is a consistent set of clauses of size 1. The (propositional) SAT problem is
to determine whether for a given formula $\phi$ there exists a total assignment $\alpha$ such that
$\phi \wedge \alpha = \top$. Given two clauses of the form $C_1 \cup \{x\}$ and $C_2 \cup \{\neg x\}$, their
resolution is defined as $C_1 \cup C_2$, and if it is not tautological the result is called a resolvent and
the variable $x$ is called the pivot variable. A resolution refutation is a sequence of resolution
operations such that the final resolvent is empty, proving that the formula is unsatisfiable.
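
The resolution rule can be illustrated with a minimal Python sketch. The representation is an assumption made for this example only: literals are DIMACS-style integers (a variable is a positive integer, its negation the corresponding negative integer), clauses are frozensets, and `resolve` is a hypothetical helper name.

```python
from typing import FrozenSet, Optional

Clause = FrozenSet[int]  # assumed encoding: x is a positive int, its negation is -x


def resolve(c1: Clause, c2: Clause, pivot: int) -> Optional[Clause]:
    """Resolve c1 (containing pivot) with c2 (containing -pivot).

    Returns None when the result is tautological, i.e., not a resolvent.
    """
    if pivot not in c1 or -pivot not in c2:
        raise ValueError("pivot must occur positively in c1 and negatively in c2")
    result = (c1 - {pivot}) | (c2 - {-pivot})
    if any(-lit in result for lit in result):
        return None
    return result


# Refutation of {x1}, {-x1, x2}, {-x2}: two steps derive the empty clause.
step1 = resolve(frozenset({1}), frozenset({-1, 2}), pivot=1)
step2 = resolve(step1, frozenset({-2}), pivot=2)
```

Here `step1` is the unit clause over x2 and `step2` is the empty clause, so the two steps form a resolution refutation.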

There are multiple techniques for propositional interpolation. Here, we refer to two popular systems,
one by McMillan [26] and the other by Huang, Krajíček and Pudlák [20, 24, 30] (HKP). Both methods
require time linear in the size of the resolution refutation of $\phi \wedge \psi$, and both construct
interpolants by associating an intermediate interpolant with every resolvent in that refutation. The
interpolant associated with the final resolvent (the empty clause) constitutes an interpolant $I$ for
which Theorem 1 holds. For a characterization of these and other interpolation systems see, e.g., [13].

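McMillan's interpolation system associates with every clause $C$ in $\phi$ the intermediate interpolant
$C \setminus \{v, \neg v \mid v \notin V_\psi\}$, i.e., the restriction of $C$ to the variables in $\psi$.
Every clause in $\psi$ is associated with the intermediate interpolant $\top$. Every other interpolant is
calculated depending on the corresponding resolution step. Consider the derivation of resolvent $R$ from
clauses $C_1$ and $C_2$ with pivot variable $x$, where $C_1$ and $C_2$ have previously been associated with
intermediate interpolants $I_{C_1}$ and $I_{C_2}$. The resulting clause $R$ is then associated with the
intermediate interpolant

$$I_R = \begin{cases}
I_{C_1} \vee I_{C_2} & \text{if } x \in V_\phi \wedge x \notin V_\psi \\
I_{C_1} \wedge I_{C_2} & \text{otherwise.}
\end{cases}$$

In the HKP system, every clause in $\phi$ is associated with the intermediate interpolant $\bot$, while the
clauses in $\psi$ are associated with $\top$. Every resolvent $R$ obtained from clauses $C_1$ and $C_2$ with
pivot variable $x$ (where $x \in C_1$ and $\neg x \in C_2$) is associated with the intermediate interpolant

$$I_R = \begin{cases}
I_{C_1} \vee I_{C_2} & \text{if } x \in V_\phi \wedge x \notin V_\psi \\
(x \vee I_{C_1}) \wedge (\neg x \vee I_{C_2}) & \text{if } x \in V_\phi \wedge x \in V_\psi \\
I_{C_1} \wedge I_{C_2} & \text{if } x \notin V_\phi \wedge x \in V_\psi.
\end{cases}$$

For every propositional interpolation system that computes an interpolant for $\phi \Rightarrow \psi$, the
dual system is defined by the computation of an interpolant $I$ for $\neg\psi \Rightarrow \neg\phi$, with the
effect that $\neg I$ is an interpolant for $\phi \Rightarrow \psi$. It is known that the HKP system is
self-dual and that McMillan's system is not [13].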
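
The following Python sketch shows how both labelling schemes can be evaluated along a resolution refutation, using the standard formulation of the two systems. Everything about the representation is an assumption made for illustration: clauses are frozensets of DIMACS-style integer literals, formulas are nested tuples, and the names `interpolant`, `is_interpolant` and `proof` are hypothetical. The brute-force check at the end only serves to validate the result on a tiny example.

```python
from itertools import product

# Formulas: True/False, ("lit", l), ("or", f, g) or ("and", f, g).

def variables(clauses):
    return {abs(l) for c in clauses for l in c}

def disjunction(lits):
    f = False
    for l in lits:
        f = ("or", f, ("lit", l))
    return f

def interpolant(phi, psi, proof, system="mcmillan"):
    """Interpolant obtained from a refutation of phi and psi.

    proof is a list of steps (c1, c2, x) with x in c1 and -x in c2;
    the last step must derive the empty clause.
    """
    v_phi, v_psi = variables(phi), variables(psi)
    partial = {c: True for c in psi}
    for c in phi:
        if system == "mcmillan":
            partial[c] = disjunction(l for l in c if abs(l) in v_psi)
        else:
            partial[c] = False
    resolvent = None
    for c1, c2, x in proof:
        i1, i2 = partial[c1], partial[c2]
        if x not in v_psi:                       # pivot local to phi
            i = ("or", i1, i2)
        elif system == "hkp" and x in v_phi:     # shared pivot, HKP only
            i = ("and", ("or", ("lit", x), i1), ("or", ("lit", -x), i2))
        else:
            i = ("and", i1, i2)
        resolvent = (c1 - {x}) | (c2 - {-x})
        partial[resolvent] = i
    if resolvent != frozenset():
        raise ValueError("proof does not end in the empty clause")
    return partial[resolvent]

def evaluate(f, model):
    if isinstance(f, bool):
        return f
    if f[0] == "lit":
        return model[abs(f[1])] == (f[1] > 0)
    left, right = evaluate(f[1], model), evaluate(f[2], model)
    return (left or right) if f[0] == "or" else (left and right)

def holds(clauses, model):
    return all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses)

def is_interpolant(i, phi, psi):
    """Brute-force check of phi => i and of the unsatisfiability of i and psi."""
    vs = sorted(variables(phi) | variables(psi))
    for bits in product([False, True], repeat=len(vs)):
        model = dict(zip(vs, bits))
        if holds(phi, model) and not evaluate(i, model):
            return False
        if evaluate(i, model) and holds(psi, model):
            return False
    return True

# phi = {x1}, {-x1, x2}; psi = {-x2}; the proof resolves on x1, then on x2.
phi = [frozenset({1}), frozenset({-1, 2})]
psi = [frozenset({-2})]
proof = [(phi[0], phi[1], 1), (frozenset({2}), psi[0], 2)]
mcmillan = interpolant(phi, psi, proof, "mcmillan")
hkp = interpolant(phi, psi, proof, "hkp")
```

On this example both systems yield a formula equivalent to x2, the only shared variable. The sketch assumes that no clause occurs in both formulas.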





3 Related Work



Our work is most closely related to parallel and distributed decision procedures. Many decision
procedures exist for the propositional satisfiability problem, and some of them exploit parallelism.

3.1 Parallel SAT Solving 

In parallel SAT, the objective is to simultaneously explore different parts of the search space in order to 
quickly solve a problem. There are two main approaches to parallel SAT solving. The first is the classical
concept of divide-and-conquer, which divides the search space into subspaces and allocates each of them
to a sequential SAT solver. The search space is divided by means of guiding-path constraints (typically
unit clauses). A formula is found satisfiable if one worker is able to find a solution for its subspace, and
unsatisfiable if all the subspaces have been proved unsatisfiable. Workers usually cooperate through a
load balancing strategy which performs the dynamic transfer of subspaces to idle workers, and through
the exchange of conflict clauses [10, 16].

In 2008, Hamadi et al. [18, 19] introduced the parallel portfolio approach. This method exploits
the complementarity between different sequential DPLL strategies to let them compete and cooperate 
on the original formula. Since each worker deals with the whole formula, there is no need for load 
balancing, and the cooperation is only achieved through the exchange of learnt clauses. Moreover, the 
search process is not artificially influenced by the original set of guiding-path constraints like in the first 
category of techniques. With this approach, the crafting of the strategies is important, and the objective 
is to cover the space of the search strategies in the best possible way. 

The main drawback of parallel SAT techniques comes from their required replication of the formula. 
This is obvious for the parallel portfolio approach. It is also true for divide-and-conquer algorithms 
whose guiding-path constraints do not produce significantly smaller subproblems (only $\log_2 c$ variables
have to be set to obtain c subproblems). This makes these techniques only applicable to problems which 
fit into the memory of a single machine. 
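
The replication argument can be made concrete with a small Python sketch. It assumes DIMACS-style integer literals, and `guiding_paths` is a hypothetical helper name: fixing k split variables yields 2^k subspaces, each described by only k unit clauses, so every worker still receives the complete formula.

```python
from itertools import product


def guiding_paths(split_vars):
    """Yield one list of unit literals per subspace of the search space."""
    for signs in product([1, -1], repeat=len(split_vars)):
        yield [sign * var for sign, var in zip(signs, split_vars)]


# Three split variables produce 2**3 = 8 subproblems.
paths = list(guiding_paths([1, 2, 3]))
```

Each subproblem is the original formula plus one element of `paths`, which is why the memory footprint per worker does not shrink noticeably.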

In the last two years, portfolio-based parallel solvers have become prominent, and the approach has been
used in SMT decision procedures as well [34]. We are not aware of any recently developed improvements on
the divide-and-conquer approach (the latest being [10]). We give a brief description of the parallel
solvers that qualified for the 2010 SAT Race:


• In plingeling [6], the original SAT instance is duplicated by a boss thread and allocated to
worker threads. The strategies used by these workers are mainly differentiated around the amount
of pre-processing, random seeds, and variable branching. Conflict clause sharing is restricted to
units which are exchanged through the boss thread. This solver won the parallel track of the 2010
SAT Race.

• ManySAT [19] was the first parallel SAT portfolio. It duplicates the instance of the SAT problem to
solve, and runs independent SAT solvers differentiated on their restart policies, branching
heuristics, random seeds, conflict clause learning, etc. It exchanges clauses through various policies.
Two versions of this solver were presented at the 2010 SAT Race; they finished second and third.

• In SArTagnan [22], different SAT algorithms are allocated to different threads, and differentiated
with respect to restart policies and VSIDS heuristics. Some threads apply a dynamic resolution
process [5, 6] or exploit reference points [23]. Some others try to simplify a shared clause database
by performing dynamic variable elimination or replacement. This solver finished fourth.

• In Antom [32], the SAT algorithms are differentiated on decision heuristic, restart strategy, conflict
clause detection, lazy hyper binary resolution [5, 6], and dynamic unit propagation lookahead.
Conflict clause sharing is implemented. This solver finished fifth.

SAT Race 2010: http://baldur.iti.uka.de/sat-race-2010

3.2 Distributed SAT Solving 

Contrary to parallel SAT, in distributed SAT, the goal is to handle problems which are by nature dis- 
tributed or, even more interestingly, to handle problems which are too large to fit into the memory of 
a single computing node. Therefore, the speed-up against a sequential execution is not necessarily the 
main objective, and in some cases (large instances) cannot even be measured. 

To the best of our knowledge, the only relevant work in the area presents an architecture tailored 
for large distributed Bounded Model Checking [15]. The objective is to perform deep BMC unwindings
thanks to a network of standard computers, where the SAT formulas become so large that they cannot 
be handled by any one of the machines. This approach uses a master/slaves topology, and the unrolling 
of an instance provides a natural partitioning of the problem in a chain of workers. Each worker has to 
reconcile its local solution with its neighbors. The master distributes the parts, and controls the search. 
First, based on proposals coming from the slaves, it selects a globally best variable to branch on. From 
that decision, each worker performs Boolean Constraint Propagation (BCP) on its subproblem, and the 
master performs the same on the globally learnt clauses. The master maintains the global assignment and,
to ensure the consistency of the parallel BCP algorithms, propagates Boolean implications to the slaves.
The master also records the causality of these implications, which allows it to perform conflict analysis
when required.

3.3 Interpolation 

McMillan's propositional interpolation system [26], when employed in a suitable Model Checking
algorithm, has been shown to perform competitively with algorithms based purely on SAT solving, i.e.,
McMillan showed that the abstraction obtained through interpolation for Model Checking problems is at
least as good as and sometimes better than previously known abstraction methods.

4 Lazy Decomposition 

When considering distributed decision procedures, it is usually assumed that the formulas which are 
to be solved are too large to be solved on a single computing node. Under this premise, strategies for 
distributing a formula have to be employed. If there exists a quantifier elimination algorithm for the
fragment considered, then it is straight-forward, but comparatively expensive to distribute the problem:
Find sparsely connected partitions of the formula and eliminate the connections such that the partitions
become independent. For example, let formula $\phi = \phi_1 \wedge \phi_2$ where the partitions $\phi_1$ and
$\phi_2$ overlap on variables $X = V_{\phi_1} \cap V_{\phi_2}$. The elimination of $X$ from
$\exists X . \phi_1 \wedge \phi_2$ produces two independent parts $\phi'_1$ and $\phi'_2$, which,
respectively, depend on variables $V_{\phi_1} \setminus X$ and $V_{\phi_2} \setminus X$ and therefore can be
solved independently.
While this distribution strategy is quite simple, it depends on the existence of a quantifier elimination 
algorithm. Furthermore, the performance of such an algorithm in practice depends greatly on the fact 
that the problem is sparsely connected, which is not generally a given. We therefore use a different and 
cheaper method for distribution: 

Definition 1 (Lazy Decomposition). Let $\phi$ be in conjunctive normal form, i.e.,
$\phi = \phi_1 \wedge \ldots \wedge \phi_n$. A lazy decomposition of $\phi$ into $k$ partitions is an
equivalent set of formulas $\{\psi_1, \ldots, \psi_k\}$ such that each $\psi_i$ is equivalent to some
conjunction of clauses from $\phi$, i.e., there exist $a, b$ ($a < b \le n$), such that
$\psi_i = \phi_a \wedge \ldots \wedge \phi_b$.

Input : Formula $\phi$
Output: $\top$ if $\phi$ is satisfiable, $\bot$ otherwise

$\psi_1, \ldots, \psi_k$ := decompose($\phi$);
$G$ := $\top$;
flag := true;
while flag do
    if $G = \bot$ then
        return $\bot$;
    else
        Let $m$ be a total model for $G$;
    end
    flag := false;
    foreach $i$ in $1 \ldots k$ do
        if $\psi_i \wedge m \equiv \bot$ then
            Let $I$ be an interpolant for $\neg(\psi_i \wedge m)$ over $V_{\psi_i} \cap V_G$;
            $G$ := $G \wedge I$;
            flag := true;
        end
    end
end
return $\top$;

Algorithm 1: A reconciliation algorithm.

We call this a lazy decomposition, because no effort is made to ensure that partitions do not share 
variables. The formulas $\psi_1, \ldots, \psi_k$ may then be solved independently, but if the partitions
happen to share variables, i.e., when $V_{\psi_i} \cap V_{\psi_j} \neq \emptyset$ for some $j \neq i$, then
these (potentially global) solutions have to be reconciled.
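
Whether reconciliation is needed at all depends only on which variables occur in more than one partition. A minimal Python sketch of that check is given below; clauses are assumed to be lists of DIMACS-style integer literals, and `shared_variables` is a hypothetical helper name.

```python
from collections import Counter


def shared_variables(partitions):
    """Variables that occur in at least two partitions (the globals)."""
    counts = Counter()
    for clauses in partitions:
        counts.update({abs(lit) for clause in clauses for lit in clause})
    return {var for var, n in counts.items() if n > 1}


partitions = [
    [[1, 2], [-2, 3]],   # partition over x1, x2, x3
    [[-3, 4], [4, 5]],   # partition over x3, x4, x5
    [[-5, 6]],           # partition over x5, x6
]
shared = shared_variables(partitions)
```

In this example only x3 and x5 are shared, so any reconciliation has to agree on just these two variables.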

Let $S_{\psi_i} = \bigvee_j m_{i,j}$ be the set of all models satisfying $\psi_i$ and let
$S = \{S_{\psi_1}, \ldots, S_{\psi_k}\}$ be the set of all models of all partitions. The reconciliation problem
is to determine whether $S_{\psi_1} \wedge \ldots \wedge S_{\psi_k}$ is satisfiable, i.e., to determine whether
there is a global model which has a matching extension in each $S_{\psi_i}$. Clearly, any set of models is
not required to be any smaller in representation than its corresponding partition; in fact, it may be
exponentially larger. In practice it may therefore be more efficient to build the solution sets
incrementally, avoiding any blowup wherever possible. To this end, we require the following lemma:

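Lemma 1. Let $\phi = \psi_1 \wedge \ldots \wedge \psi_k$ and let $m$ be a model for the shared variables
$V := \bigcup_{i \neq j} (V_{\psi_i} \cap V_{\psi_j})$ and let $1 \le i \le k$. If $I$ is an interpolant for
$\neg(\psi_i \wedge m)$ then $\phi \Rightarrow I$.

Proof. $I$ is an interpolant for $\neg(\psi_i \wedge m)$ or, equivalently, for $\psi_i \Rightarrow \neg m$.
Therefore $\psi_i \Rightarrow I$. Since $\phi \Rightarrow \psi_i$ we also have $\phi \Rightarrow I$. □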

Algorithm 1 presents a simple method that makes use of this Lemma to solve a decomposed formula.
First, it extracts a model $m$ for $G$, which is over the shared variables of the decomposition (the globals).
It then attempts to extend the model to models satisfying each of the partitions and returns $\top$ if this was
successful. Otherwise, it extracts an interpolant $I$ from every unsatisfiable partition which is subsequently
used to refine $G$. When $G$ is found to be unsatisfiable the algorithm returns $\bot$ as there cannot be any
model that is extensible to all partitions.
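
The reconciliation loop can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions, not the implementation evaluated later: satisfiability is decided by brute-force enumeration, and instead of a proof-based interpolation system the sketch uses the negation of a minimized conflicting subset of $m$, which is a clausal interpolant for $\psi_i \Rightarrow \neg m$. All function names are hypothetical.

```python
from itertools import product


def variables(clauses):
    return {abs(l) for c in clauses for l in c}


def solve(clauses, vs, fixed=()):
    """Brute-force SAT: a model over vs extending the literals in fixed, or None."""
    fixed = {abs(l): l > 0 for l in fixed}
    free = sorted(set(vs) - set(fixed))
    for bits in product([False, True], repeat=len(free)):
        model = {**fixed, **dict(zip(free, bits))}
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return model
    return None


def clause_interpolant(part, m):
    """Negate a minimal subset of m that is still inconsistent with part."""
    vs = variables(part)
    core = [l for l in m if abs(l) in vs]
    for l in list(core):
        trial = [k for k in core if k != l]
        if solve(part, vs, trial) is None:
            core = trial
    return [-l for l in core]


def reconcile(partitions):
    """True iff the conjunction of all partitions is satisfiable."""
    occ = [variables(p) for p in partitions]
    shared = {v for i, vi in enumerate(occ)
              for j, vj in enumerate(occ) if i < j for v in vi & vj}
    g = []                                  # G := top, kept as a list of clauses
    while True:
        model = solve(g, shared)
        if model is None:
            return False                    # G has become unsatisfiable
        m = [v if val else -v for v, val in model.items()]
        refined = False
        for part in partitions:
            if solve(part, variables(part), m) is None:
                g.append(clause_interpolant(part, m))
                refined = True
        if not refined:
            return True


sat_result = reconcile([[[1, 2]], [[-2, 3]], [[-3, -1]]])
unsat_result = reconcile([[[1], [-1, 2]], [[-2, 3]], [[-3]]])
```

Every clause added to `g` is falsified by the current `m`, so each iteration excludes at least one assignment to the shared variables and the loop terminates.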

The maximum number of iterations required by Algorithm 1 depends on the number and the domain of
the shared variables in the decomposition. This motivates, on the one hand, the use of decomposition
techniques that find partitions with little overlap and, on the other hand, the use of interpolation
techniques that produce (logically) weak interpolants such that every interpolant covers as many (global)
models as possible.

Theorem 2. Algorithm 1 is sound.

Proof. When the algorithm returns $\top$, there is a model $m$ for $G$ which has an extension in every
$\psi_i$. Since the partitions overlap only on the shared variables, which are fixed by $m$, these extensions
combine into a model of $\phi$. Conversely, if the algorithm returns $\bot$, then every potential model $m$ is
excluded by some interpolant which implies $\neg m$ and which, by Lemma 1, is implied by $\phi$. □

Theorem 3. Algorithm 1 is complete for formulas over finite-domain variables.

Proof. Every iteration of the algorithm excludes at least one possible model from $G$ (otherwise the
algorithm would terminate and return $\top$). For formulas over finite-domain variables there is only a
finite number of potential models. Therefore, $G$ has to become unsatisfiable at some point, forcing
$G = \bot$ and therefore termination of the algorithm. □



5 Interpolation and Conflict Clauses 

The DPLL procedure is an algorithm that solves the SAT problem (see, e.g., ESI ). It does so by evaluat- 
ing a series of partial assignments until a total assignment is found. When a partial assignment is found 
to be inconsistent with the input formula, DPLL backtracks to a previous (smaller) assignment. Modern 
incarnations of this algorithm use conflict-driven backtracking, which means that the conflicting state of 
the solver is analyzed and a conflict clause is derived. It is required that every conflict clause be implied 
by the original formula, that it is over the variables of the current assignment, and that it be inconsistent 
with the current assignment. Any conflict clause is therefore redundant, but it may help to prevent further 
conflicts when it is kept in the clause database (in which case it is called a learnt clause). We think it 
worthwhile to characterize the relationship between conflict clauses and interpolants:

Corollary 1. Every conflict clause for a propositional formula $\phi$ derived under the partial assignment
$\alpha$ is an interpolant for $\phi \Rightarrow \neg\alpha$.

Proof. According to the definition of a conflict clause $C$, it must be implied by $\phi$ and inconsistent
with $\alpha$. We therefore have $\phi \Rightarrow C$ and $\neg(C \wedge \alpha) = C \Rightarrow \neg\alpha$,
which makes $C$ an interpolant for $\phi \Rightarrow \neg\alpha$ by Theorem 1. □
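
The corollary can be checked by brute force on a small instance. The Python sketch below is an illustration only: the formula, the assignment and the helper name `is_interpolant_for` are assumptions, and literals are DIMACS-style integers. It tests the three conditions used in the proof, namely $\phi \Rightarrow C$, $C \Rightarrow \neg\alpha$, and that $C$ is over variables common to $\phi$ and $\alpha$.

```python
from itertools import product


def satisfies(clause, model):
    return any(model[abs(l)] == (l > 0) for l in clause)


def is_interpolant_for(phi, alpha, clause):
    """Check phi => clause, clause => not alpha, and the variable condition."""
    v_phi = {abs(l) for c in phi for l in c}
    v_alpha = {abs(l) for l in alpha}
    if not {abs(l) for l in clause} <= (v_phi & v_alpha):
        return False
    vs = sorted(v_phi | v_alpha)
    for bits in product([False, True], repeat=len(vs)):
        model = dict(zip(vs, bits))
        phi_holds = all(satisfies(c, model) for c in phi)
        alpha_holds = all(model[abs(l)] == (l > 0) for l in alpha)
        clause_holds = satisfies(clause, model)
        if (phi_holds and not clause_holds) or (clause_holds and alpha_holds):
            return False
    return True


# Setting x1 forces x2 and x3 via the first two clauses, contradicting the third.
phi = [[-1, 2], [-2, 3], [-1, -3]]
alpha = [1, 2]                       # x1 decided, x2 implied
good = is_interpolant_for(phi, alpha, [-1])
bad = is_interpolant_for(phi, alpha, [-2])
```

The clause containing only the negation of x1 passes the check, whereas the clause negating x2 is not implied by the formula and is rejected.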

Currently, the most popular conflict resolution scheme for DPLL-style solvers is the so-called First- 
UIP method (for a definition see Q). The corollary stated above raises the question whether other 
interpolation methods are able to improve upon this scheme. Note that the First-UIP scheme has some 
properties which make it very efficient in practice: 

• a conflict clause can be computed in linear time and 

• every such clause is asserting, i.e., it contains a unique literal which is unassigned after
backtracking.

More general interpolation schemes like McMillan's or HKP also have the first of these properties
since interpolants are usually computed in linear time from a resolution proof. Therefore, an interpolant 
may be computed in linear time, too, if the resolution proof of the current conflict is kept in the state of 
the solver. This, however, is much more expensive than keeping the reasons for implications as is done 
in the First-UIP scheme. 

An interpolant is generally not of clause form. If it is to be kept as part of the problem (akin to a 
learnt clause) it therefore requires conversion. The straight-forward expansion to CNF may increase the 
size of the interpolant exponentially. The Tseitin transformation [33] increases the size of the interpolant
only linearly, but introduces new variables. It is not clear which of these methods is to be preferred. In 
general, however, a set of conflict clauses is produced, instead of a single clause like in the First-UIP 
scheme. 
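
The trade-off between the two conversions can be seen in a minimal Python sketch of the Tseitin transformation. The formula representation (nested tuples over DIMACS-style integer literals, without constants) and the names `tseitin` and `to_cnf` are assumptions made for this example.

```python
from itertools import count


def tseitin(f, fresh):
    """Return (root_literal, defining_clauses); fresh yields unused variables."""
    if f[0] == "lit":
        return f[1], []
    a, clauses_a = tseitin(f[1], fresh)
    b, clauses_b = tseitin(f[2], fresh)
    t = next(fresh)
    if f[0] == "and":                      # t <-> (a and b)
        defs = [[-t, a], [-t, b], [t, -a, -b]]
    else:                                  # t <-> (a or b)
        defs = [[t, -a], [t, -b], [-t, a, b]]
    return t, clauses_a + clauses_b + defs


def to_cnf(f, num_vars):
    """Equisatisfiable CNF of f, numbering new variables from num_vars + 1."""
    root, clauses = tseitin(f, count(num_vars + 1))
    return clauses + [[root]]


# (x1 and x2) or (x3 and x4)
example = ("or",
           ("and", ("lit", 1), ("lit", 2)),
           ("and", ("lit", 3), ("lit", 4)))
cnf = to_cnf(example, 4)
```

For this input the result has three new variables and ten clauses. Direct distribution gives only four clauses here, but for a disjunction of n such conjunctions it gives 2^n clauses, while the Tseitin encoding grows linearly.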

An interpolant is also asserting in the sense that it is asserted to be true; however, it is not immedi- 
ately asserting a specific literal like a First-UIP conflict clause. A preliminary experimental evaluation 
(of which the details are omitted) has shown that none of the known propositional interpolation meth- 
ods performs better than the First-UIP scheme. This, however, may be due to the lack of an efficient 
interpolation algorithm that matches the performance of the algorithm for First-UIP conflict resolution. 

6 Experimental Evaluation 

As a first step in evaluating our algorithm, we implemented a propositional satisfiability solver based on
the MiniSAT solver. We restrict ourselves to the slightly outdated version 1.14p, because propositional
interpolation methods require proof production, which is not available in more recent versions of
MiniSAT [14]. Interpolants are produced by iterating over the resolution proof, which is saved (explicitly)
in memory. We use Reduced Boolean Circuits (RBCs [1]) to represent interpolants such that recurring
structure is exploited. Algorithm 1 permits the exploitation of state-of-the-art SAT solver technology,
like incremental solving techniques, in solving partitions. Furthermore, every assignment to the globals
is a set of clauses of size 1, which means that facilities for solving a formula under assumptions may be
used. The lazy decomposition used by our implementation is indeed quite trivial: it simply divides the
clauses of the problem into a predefined number $p$ of equally sized partitions. Clauses are ordered as
they appear in the input file and each partition $i$ is assigned the clauses numbered from
$i \cdot \frac{n}{p}$ to $(i + 1) \cdot \frac{n}{p}$, where $n$ is the total number of clauses.
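
The index arithmetic of this decomposition can be written down directly. The Python sketch below is an assumption-laden illustration: it uses integer division and half-open ranges so that every clause lands in exactly one partition even when $p$ does not divide $n$, a detail the description above leaves open. The name `lazy_decompose` is hypothetical.

```python
def lazy_decompose(clauses, p):
    """Split clauses, in input order, into p contiguous partitions."""
    n = len(clauses)
    return [clauses[i * n // p:(i + 1) * n // p] for i in range(p)]


# Ten clauses (represented by their indices) split into three partitions.
parts = lazy_decompose(list(range(10)), 3)
```

No clause is inspected at any point, which is what makes the decomposition lazy: its cost is independent of the structure of the formula.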

Our implementation is evaluated on a set of formulas which are small but hard to solve [2]. They are
known to contain symmetries, which potentially can be exploited by interpolation. For this evaluation, 
our implementation uses only a single processing element, i.e., the evaluation of the partitions of a 
decomposition is sequentialized. Through this, we are able to show that our algorithm performs well 
even when using the same resources as a traditional solver. Preliminary experiments have shown that an 
actual (shared-memory) parallelization of our algorithm performs better than the sequentialized version, 
but not significantly so, which is due to the lack of a load-balancing mechanism to balance the runtime 
of partition evaluation. 

All our experiments are executed on a Windows HPC cluster of dual Quad-Xeon 2.5 GHz processors 
with 16 GB of memory each, using a timeout of 3600 seconds and a memory limit of 2 GB. 

To assess the impact of the decomposition on the solver performance, we investigate all decompositions
into 2 to 50 partitions for each of the three interpolation methods (McMillan's, Dual McMillan's
and HKP). Each line in Figures 1a, 1b and 1c corresponds to one benchmark file and indicates the change
in runtime required to solve the benchmark when using from 2 up to 50 partitions in the decomposition.
These figures provide strong evidence for an improvement of the runtime behavior as the number of
partitions increases. Note that, as mentioned before, the evaluation of partitions was sequentialized for
this experiment, i.e., this effect is not due to an increasing number of resources being utilized. The
graphs in Figures 1a, 1b and 1c provide equally strong evidence for the utility of McMillan's interpolation
system: the impact on the runtime is the largest and most consistent of all three interpolation systems.

Figure 1: Decomposition into $\{2, \ldots, 50\}$ partitions, using different interpolation systems.
Panel (a): McMillan interpolants.

Table 1: Runtime comparison with MiniSAT (versions 2.2.0 and 1.14p), using McMillan's interpolation
system. Bold numbers indicate the smallest runtime in the decompositions shown here or the best
decomposition into $\{2, \ldots, 50\}$ partitions (right-most column).

Finally, Table 1 provides a comparison of the runtime of MiniSAT versions 2.2.0 and 1.14p with
a selection of different decompositions, the right-most column indicating the time of the best
decomposition found among all those evaluated. It is clear from this table that no single partitioning can
be identified as the best overall method. However, some decompositions, like the one into 50 partitions,
perform consistently well and almost always better than either version of MiniSAT.

7 Conclusion 

We present the concept of lazy distribution for first-order decision procedures. Formulas are decomposed 
into partitions without the need for quantifier elimination or any other method for logical disconnection 
of the partitions. Instead, local models for the partitions are reconciled globally through the use of 
Craig interpolation. Experiments using different interpolation systems and decompositions for
propositional formulas indicate that our approach performs better than traditional solving methods even when
sequentialized, i.e., when no additional resources are used. At the same time, our algorithm provides 
straight-forward opportunities for parallelization and distribution of the solving process.